-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use one RNG per Event Producer #192
Conversation
Nodes and activity are not included here, because we want to be able to create our config struct separately from parsing of nodes and activity. This reasoning will become more apparent in future commits when we add two different running modes for the simulator (regular operation and simulated network mode).
Preparation for further refactoring where we'll remove the mutex.
@enigbe while using the simulator with fixed seed I noticed that the variance issues that you pointed out get much worse with larger networks (makes sense, there are more events taking place so we have more variance). I should have thought about this option during review when you had the original version with one rng per producer (as I've done here), sorry! Took me running this to get more of an intuition into where we need to fix things up. |
Please don't apologize. It was fine implementing it in the way you suggested (helps me learn more about the project). I will review this today/tomorrow. |
sim-lib/src/lib.rs
Outdated
// generator will get the same set of values across runs (with a shared RNG, the order in | ||
// which tasks access the shared RNG would change the value that each generator gets). We | ||
// "salt" the set seed with a value based on the node's public key so that each generator will | ||
// start with a different seed, but it will be the same across runs. It is not critical is the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be if
. (... not critical if ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from a single nitpick this looks fine. Happy to take another look when you think this is ready.
As-is, we are sharing a single RNG for all of our random operations - destination selection for each payment, and selection of payment parameters per-sending node. All of these operations may complete in in different orders every time we run (be it because a task gets access to a mutex in a different order, or a payment is faster or slower to dispatch on the actual underlying node). This leads to large variance in our supposedly deterministic run, because the values we get from our "fixed" seed are different based on this order of operations. To address this, we use separate RNGs per-task, which reduces the variety in order of operations impact on the values that each task gets from its RNG. We can't do this for network_generator (because we use one for all of our tasks), so we take the approach of passing a single RNG into both our Payment and Network generator. This is admittedly a little clunky API-wise, IMO a great improvement in the feature. Testing this change against the original approach with: - 50x runs of the simulator - Sum up total revenue for a target node per run - Look at the coefficient of variance (mean / std dev) for each data set Results: - Old approach: CoV = 0.5 - New approach: CoV = 40 This increase indicates a large *decrease* in the variance in the revenue of a single node across runs (which is what we want, more deterministic outcomes with a fixed seed).
To get some variety in our simulator, we update our RNG that's based on the same set seed to also use some data from nodes public keys. This will be the same across runs, but provides different values for each node.
b24bb78
to
082a6db
Compare
Closing due to (my own) inactivity, I'll open it back up when I've got some capacity for this. Would still like to find an approach that doesn't require carting a rng around our traits, but couldn't figure that out when I looked at it last. |
This PR updates #171 to create one RNG per event producer to decrease the variance across runs of the simulator when running with a fixed seed. This approach was originally taken in #171, but each RNG was seeded with the same value which led to nodes with the same capacity having the exact same payment flows (not what we want). I suggested that we share a RNG to get around that, but it introduces other problems - specifically that all tasks in the simulator are vying for mutex access to the same RNG, so the order of operations highly effects the determinism of the simulator.
This PR takes a new approach, which "salts" the seed with material from each node's public key to make the RNGs per-task deterministic but different.
Two design decisions that I wasn't totally happy with:
NetworkGenerator
because it holds the whole graph, so we can't pass a unique RNG in for each event producer. The solution here is to include the RNG in the API of the network generator in each call:PaymentGenerator
(though we could, because this one is generated per event producer).DefinedPaymentGenerator
implementation ofPaymentGenerator
where we don't actually need it.Option<u64>
and then salt it withu64
, this should probably by anOption<u64, u64>
because we don't need the salt if we don't have a seed, but it seems unnecessary to unwrap and re-wrap (got lazy, might fix).To get an idea of the improvement, I ran the simulator 50x with the two approaches and measured the coefficient of variance (mean / std dev) for the revenue that one node has in the simulation.
A larger number indicates that our variance is lower, so this is quite a dramatic improvement which IMO makes it worthwhile to make the API changes.